1,389 research outputs found

    Input Fast-Forwarding for Better Deep Learning

    Full text link
    This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from "deep supervision", in which the loss layer is re-introduced at earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, the new architecture substantially reduces the problem of vanishing gradients, because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate the improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community. Comment: Accepted in the 14th International Conference on Image Analysis and Recognition (ICIAR) 2017, Montreal, Canada
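    The fast-forward idea can be sketched in a few lines: a cheap parallel projection of the raw input is concatenated with the output of the deep path, so a deeper layer sees high- and low-level information at once. This is a minimal NumPy sketch with made-up layer sizes, not the paper's actual FFNet architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w):
    """One standard processing layer: linear map + ReLU."""
    return np.maximum(x @ w, 0.0)

# Hypothetical sizes, chosen only for illustration.
d_in, d_hid = 8, 16
x = rng.normal(size=(4, d_in))           # a batch of 4 inputs

# Standard deep path: three stacked layers.
w1 = rng.normal(size=(d_in, d_hid)) * 0.1
w2 = rng.normal(size=(d_hid, d_hid)) * 0.1
w3 = rng.normal(size=(d_hid, d_hid)) * 0.1
h = layer(layer(layer(x, w1), w2), w3)   # higher-level features

# Fast-forward path: a single cheap projection of the raw input
# that skips the deep stack entirely (and gives gradients a short route).
w_ff = rng.normal(size=(d_in, d_hid)) * 0.1
ff = layer(x, w_ff)                      # lower-level features

# A deeper layer sees both: concatenate high- and low-level information.
combined = np.concatenate([h, ff], axis=1)
print(combined.shape)
```

    During backpropagation, the gradient reaching `w_ff` passes through only one layer instead of three, which is the shorter route the abstract refers to.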

    Using the Forest to See the Trees: Exploiting Context for Visual Object Detection and Localization

    Get PDF
    Recognizing objects in images is an active area of research in computer vision. In the last two decades there has been much progress, and object recognition systems are already operating in commercial products. However, most algorithms for detecting objects perform an exhaustive search across all locations and scales in the image, comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places where the object is expected to be; this also increases performance by rejecting patterns that look like the target but appear in unlikely places. Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties, and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
    United States. National Geospatial-Intelligence Agency (NEGI-1582-04-0004); United States. Army Research Office, Multidisciplinary University Research Initiative (Grant N00014-06-1-0734); National Science Foundation (U.S.) (Contract IIS-0413232); National Defense Science and Engineering Graduate Fellowship
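    The probabilistic combination of local evidence with a holistic scene prior can be illustrated with a toy one-dimensional example. Everything below (the Gaussian prior, its center, the random "detector scores") is an illustrative assumption, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image" of 20 candidate locations (all numbers illustrative).
n = 20
local_score = rng.random(n)            # stand-in for p(appearance | object here)
context_prior = np.exp(-0.5 * ((np.arange(n) - 12) / 3.0) ** 2)
context_prior /= context_prior.sum()   # scene context: object expected near 12

# Bayes-style combination: local evidence weighted by the scene prior,
# so look-alike patterns in unlikely places are suppressed.
posterior = local_score * context_prior
posterior /= posterior.sum()

best = int(posterior.argmax())
print(best)
```

    Locations far from where the scene says the object should be are downweighted regardless of how strong their local score is, which is exactly the search-space reduction described above.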

    Multi-utility Learning: Structured-output Learning with Multiple Annotation-specific Loss Functions

    Full text link
    Structured-output learning is a challenging problem, particularly so because of the difficulty in obtaining large datasets of fully labelled instances for training. In this paper, we try to overcome this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging semantic image segmentation problem, for which a wide variety of annotations can be used. For instance, the popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects, or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.
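    Annotation-specific losses can be illustrated as a dispatch over supervision types. The three losses below (per-pixel negative log-likelihood, an image-tag "appears somewhere" loss, and a "dominates inside the box" loss) are illustrative stand-ins, not the paper's actual formulations.

```python
import numpy as np

def annotation_loss(probs, ann_type, ann):
    """Pick a loss matching the supervision available for one image.
    probs: (H, W, C) per-pixel class probabilities. Illustrative only."""
    if ann_type == "pixel":              # full labelling: per-pixel NLL
        h, w = ann.shape
        p = probs[np.arange(h)[:, None], np.arange(w), ann]
        return -np.log(p + 1e-9).mean()
    if ann_type == "image":              # image-level tags: each tagged class
        present = probs.max(axis=(0, 1)) # should appear somewhere in the image
        return -np.log(present[ann] + 1e-9).mean()
    if ann_type == "box":                # box: tagged class should dominate inside
        (y0, y1, x0, x1), cls = ann
        return -np.log(probs[y0:y1, x0:x1, cls] + 1e-9).mean()
    raise ValueError(ann_type)

# Dummy uniform prediction over 3 classes on a 4x4 image.
probs = np.full((4, 4, 3), 1 / 3)
full = annotation_loss(probs, "pixel", np.zeros((4, 4), dtype=int))
tags = annotation_loss(probs, "image", np.array([0, 2]))
box = annotation_loss(probs, "box", ((0, 2, 0, 2), 1))
print(full, tags, box)
```

    A training loop would then sum, over each instance, whichever loss matches that instance's annotation type, which is what lets one model learn from a mixture of strong and weak labels.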

    Lean manual assembly 4.0: A systematic review

    Get PDF
    In a demand context of mass customization, shifting towards the mass personalization of products, assembly operations face the trade-off between highly productive automated systems and flexible manual operators. Novel digital technologies—conceptualized as Industry 4.0—suggest the possibility of simultaneously achieving superior productivity and flexibility. This article aims to address how Industry 4.0 technologies could improve the productivity, flexibility and quality of assembly operations. A systematic literature review was carried out, including 234 peer-reviewed articles from 2010–2020. As a result, the analysis was structured to address four sets of research questions regarding (1) assembly for mass customization; (2) Industry 4.0 and performance evaluation; (3) Lean production as a starting point for smart factories; and (4) the implications of Industry 4.0 for people in assembly operations. It was found that mass customization brings great complexity that needs to be addressed at different levels from a holistic point of view; that Industry 4.0 offers powerful tools to achieve superior productivity and flexibility in assembly; that Lean is a great starting point for implementing such changes; and that people need to be considered central to Assembly 4.0. Developing methodologies for implementing Industry 4.0 to achieve specific business goals remains an open research topic.

    Motivation in adapted sport

    Get PDF
    This study examines the motivation for sport practice among people with disabilities who take part in federated sport. The sample comprised 134 athletes of both genders with different disabilities. The “Participation Motivation Inventory Questionnaire” by Gill, Gross and Huddleston was used. The instrument was adapted to Paralympic sport and describes the main reasons that encourage sports practice. The results show no significant differences between men and women, or among blind/visually impaired, physical and motor disabilities. Regarding the motivation for sport practice, it is worth highlighting the importance given to fitness and health factors, such as enjoying the sport itself, improving one's level, competing, feeling good and having fun, well above being popular, being influenced by coaches or satisfying parents.

    Operator-centred Lean 4.0 framework for flexible assembly lines

    Get PDF
    This article provides a starting point for developing a methodology to successfully implement Industry 4.0 technology for assembly operations. It presents a novel multi-layer human-centred conceptual model in line with Lean philosophy which identifies the assembly operator functions and relates them to other production departments, identifying how they would be affected by incorporating new digital technologies. The model shows that assembly operators would only be directly supported by hardware digital technologies, while the production support departments would mainly employ Industry 4.0 software technologies. The work presented here paves the way for developing a methodology for implementing Lean Assembly 4.0

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    Full text link
    We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval, where classifiers are learnt at run time for a textual query from downloaded images and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retrieval over standard benchmark datasets containing 1M+ images; (ii) we show that ConvNets can be used to obtain features that are highly performant, yet much lower dimensional than previous state-of-the-art image representations, and that their dimensionality can be reduced further without loss of performance by compression using product quantization or binarization. Consequently, features with state-of-the-art performance on large-scale datasets of millions of images can fit in the memory of even a commodity GPU card; (iii) we show that an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel with downloading the new training images, allowing for continuous refinement of the model as more images become available, and simultaneous training and ranking. The outcome is an on-the-fly system that significantly outperforms its predecessors in terms of precision of retrieval, memory requirements and speed, facilitating accurate on-the-fly learning and ranking in under a second on a single GPU. Comment: Published in proceedings of ACCV 201
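    Product quantization, mentioned in contribution (ii), compresses a descriptor by splitting it into m subvectors and storing one small centroid index per subspace. The sketch below uses illustrative sizes (128-dim float32 descriptors, 8 subspaces, 16 centroids each), not the paper's settings: each descriptor shrinks from 512 bytes to 8 one-byte codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n descriptors of dimension d, split into m subspaces
# with k centroids each (all sizes illustrative).
n, d, m, k = 1000, 128, 8, 16
sub = d // m
X = rng.normal(size=(n, d)).astype(np.float32)

def kmeans(data, k, iters=10):
    """Minimal k-means returning the centroid matrix."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        idx = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = data[idx == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids

# Learn one codebook per subspace, then encode each descriptor as
# m one-byte centroid indices instead of d float32 values.
codebooks = [kmeans(X[:, i * sub:(i + 1) * sub], k) for i in range(m)]
codes = np.stack([
    np.argmin(((X[:, i * sub:(i + 1) * sub][:, None] - cb) ** 2).sum(-1), axis=1)
    for i, cb in enumerate(codebooks)
], axis=1).astype(np.uint8)

print(codes.shape)   # one row of m byte-sized codes per descriptor
```

    At search time, distances are computed against the codebooks rather than the raw vectors, which is what lets millions of compressed descriptors fit in GPU memory.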

    Background modeling for video sequences by stacked denoising autoencoders

    Get PDF
    Nowadays, the analysis and extraction of relevant information from visual data flows is of paramount importance. These image sequences can last for hours, which implies that the model must adapt to all kinds of circumstances so that the performance of the system does not decay over time. In this paper we propose a methodology for background modeling and foreground detection whose main characteristic is its robustness against stationary noise. Stacked denoising autoencoders are applied to generate a set of robust features for each region or patch of the image, which are then the input of a probabilistic model that determines whether that region is background or foreground. Evaluation on a set of heterogeneous sequences shows that, although our proposal performs similarly to the classical methods in the literature, the inclusion of noise in these sequences causes drastic performance drops in the competing methods, whereas in our case the performance holds or falls only slightly.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
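    A single denoising-autoencoder layer, the building block that gets stacked in such a pipeline, can be sketched as follows: each patch is corrupted, encoded, decoded with tied weights, and trained to reconstruct the clean patch, so the learned features are robust to noise. All sizes, the corruption rate and the training schedule below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy patches standing in for image regions (sizes illustrative).
patches = rng.random((256, 64)).astype(np.float32)  # 256 patches, 8x8 each

d_in, d_hid, lr, noise = 64, 32, 0.1, 0.3
W = rng.normal(scale=0.1, size=(d_in, d_hid)).astype(np.float32)
b, c = np.zeros(d_hid, np.float32), np.zeros(d_in, np.float32)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One denoising autoencoder layer: corrupt the input, encode, decode
# with tied weights, and minimise reconstruction error vs the CLEAN input.
for _ in range(200):
    mask = rng.random(patches.shape) > noise        # drop ~30% of pixels
    x_tilde = patches * mask
    h = sigmoid(x_tilde @ W + b)                    # noise-robust hidden code
    x_hat = sigmoid(h @ W.T + c)                    # reconstruction
    err = x_hat - patches
    # Backprop through the tied weights (sigmoid derivatives included):
    # W receives gradients from both the encoder and the decoder term.
    d_xhat = err * x_hat * (1 - x_hat)
    d_h = (d_xhat @ W) * h * (1 - h)
    W -= lr * (x_tilde.T @ d_h + d_xhat.T @ h) / len(patches)
    b -= lr * d_h.mean(0)
    c -= lr * d_xhat.mean(0)

features = sigmoid(patches @ W + b)  # per-patch features for the
print(features.shape)                # downstream probabilistic model
```

    Stacking simply repeats this step, training the next layer on the hidden codes of the previous one; the final per-patch features feed the background/foreground model.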